AITopics

2501.074

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

arXiv.org Artificial Intelligence, Jul-21-2024

Learning to Compile Programs to Neural Networks

Weber, Logan, Michel, Jesse, Renda, Alex, Carbin, Michael

A $\textit{neural surrogate of a program}$ is a neural network that mimics the behavior of a program. Researchers have used these neural surrogates to automatically tune program inputs, adapt programs to new settings, and accelerate computations. Researchers traditionally develop neural surrogates by training on input-output examples from a single program. Alternatively, language models trained on a large dataset including many programs can consume program text to act as a neural surrogate. Using a language model to both generate a surrogate and act as a surrogate, however, leads to a trade-off between resource consumption and accuracy. We present $\textit{neural surrogate compilation}$, a technique for producing neural surrogates directly from program text without coupling neural surrogate generation and execution. We implement neural surrogate compilers using hypernetworks trained on a dataset of C programs and find that they produce neural surrogates that are $1.9$-$9.5\times$ as data-efficient, produce visual results that are $1.0$-$1.3\times$ more similar to ground truth, and train in $4.3$-$7.3\times$ fewer epochs than neural surrogates trained from scratch.
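To make the decoupling of surrogate generation and surrogate execution concrete, here is a minimal sketch of a hypernetwork, that is, a network whose output is the weight vector of another network. Everything in it is an assumption for illustration: the random `embedding` stands in for an encoding of program text, the hypernetwork `H` is a single untrained linear map, and the layer sizes are arbitrary. It is not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: program-embedding width and a one-hidden-layer surrogate.
EMBED_DIM, IN_DIM, HIDDEN, OUT_DIM = 16, 2, 8, 1
N_PARAMS = IN_DIM * HIDDEN + HIDDEN + HIDDEN * OUT_DIM + OUT_DIM

# The "hypernetwork": here just one random, untrained linear map.
H = rng.normal(scale=0.1, size=(EMBED_DIM, N_PARAMS))


def compile_surrogate(program_embedding):
    """Map a program embedding to the weights of a small MLP surrogate."""
    flat = program_embedding @ H
    i = 0
    W1 = flat[i:i + IN_DIM * HIDDEN].reshape(IN_DIM, HIDDEN)
    i += IN_DIM * HIDDEN
    b1 = flat[i:i + HIDDEN]
    i += HIDDEN
    W2 = flat[i:i + HIDDEN * OUT_DIM].reshape(HIDDEN, OUT_DIM)
    i += HIDDEN * OUT_DIM
    b2 = flat[i:i + OUT_DIM]
    return W1, b1, W2, b2


def run_surrogate(params, x):
    """Execute the surrogate; the hypernetwork is not needed at this stage."""
    W1, b1, W2, b2 = params
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2


embedding = rng.normal(size=EMBED_DIM)  # stand-in for encoded program text
params = compile_surrogate(embedding)   # generation happens once
y = run_surrogate(params, np.array([[0.5, -1.0]]))  # execution is a cheap MLP call
```

In a setup like this, the generated weights could serve as an initialization that is then fine-tuned on input-output examples of the target program.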

initialization, initialization method, training input, (15 more...)

2407.15078

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
Europe > Monaco (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

van Meegen, Alexander, Sompolinsky, Haim

Coding schemes in neural networks learning classification tasks

arXiv.org Machine Learning, Jun-24-2024

Neural networks possess the crucial ability to generate meaningful representations of task-dependent features. Indeed, with appropriate scaling, supervised learning in neural networks can result in strong, task-dependent feature learning. However, the nature of the emergent representations, which we call the `coding scheme', is still unclear. To understand the emergent coding scheme, we investigate fully-connected, wide neural networks learning classification tasks using the Bayesian framework where learning shapes the posterior distribution of the network weights. Consistent with previous findings, our analysis of the feature learning regime (also known as `non-lazy', `rich', or `mean-field' regime) shows that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity. In linear networks, an analog coding scheme of the task emerges. Despite the strong representations, the mean predictor is identical to the lazy case. In nonlinear networks, spontaneous symmetry breaking leads to either redundant or sparse coding schemes. Our findings highlight how network properties such as scaling of weights and neuronal nonlinearity can profoundly influence the emergent representations.

neuron, posterior, readout weight, (16 more...)

2406.16689

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Chen, Thomas, Ewald, Patricia Muñoz

Geometric structure of Deep Learning networks and construction of global ${\mathcal L}^2$ minimizers

arXiv.org Machine Learning, Dec-17-2023

In this paper, we provide a geometric interpretation of the structure of Deep Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt) cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension $Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training input size $N$ can be arbitrarily large - thus, we are considering the underparametrized regime. We apply our recent results on shallow neural networks to construct an explicit family of minimizers for the global minimum of the cost function in the case $L\geq Q$, which we show to be degenerate. In the context presented here, the hidden layers of the DL network "curate" the training inputs by recursive application of a truncation map that minimizes the noise to signal ratio of the training inputs. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function. Our constructions make no use of gradient descent algorithms at all.

artificial intelligence, cost function, machine learning, (17 more...)

2309.10639

Country:

North America > United States > Texas > Travis County > Austin (0.14)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Huang, Zonghao, Gong, Neil, Reiter, Michael K.

Mendata: A Framework to Purify Manipulated Training Data

arXiv.org Artificial Intelligence, Dec-2-2023

Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit. Data purification aims to remove such manipulations prior to training the model. We propose Mendata, a novel framework to purify manipulated training data. Starting from a small reference dataset in which a large majority of the inputs are clean, Mendata perturbs the training inputs so that they retain their utility but are distributed similarly (as measured by Wasserstein distance) to the reference data, thereby eliminating hidden properties from the learned model. A key challenge is how to find such perturbations, which we address by formulating a min-max optimization problem and developing a two-step method to iteratively solve it. We demonstrate the effectiveness of Mendata by applying it to defeat state-of-the-art data poisoning and data tracing techniques.
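As a rough illustration of the distance used to compare the perturbed training data with the reference data, the sketch below computes the empirical Wasserstein-1 distance between two equal-sized one-dimensional samples. This is a simplification assumed here for clarity: the function name `wasserstein_1d` is hypothetical, and the abstract does not say how Mendata estimates the distance on real, high-dimensional inputs.

```python
import numpy as np


def wasserstein_1d(a, b):
    """Empirical Wasserstein-1 distance between two equal-sized 1-D samples.

    For equal sample sizes the optimal coupling matches sorted values,
    so the distance is the mean absolute difference of the order statistics.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(a - b)))


rng = np.random.default_rng(0)
reference = rng.normal(size=500)
shifted = reference + 3.0  # every point moved by the same amount

d_same = wasserstein_1d(reference, reference)
d_shift = wasserstein_1d(reference, shifted)
```

A purification step in this spirit would look for perturbations of the training inputs that drive such a distance toward zero while keeping the inputs useful for training.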

dataset, detector, mendata, (15 more...)

2312.01281

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Nepal (0.04)

Genre: Research Report > New Finding (0.94)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Nguyen, Thu, Halvorsen, Pål, Riegler, Michael A.

Imputation using training labels and classification via label imputation

arXiv.org Machine Learning, Nov-28-2023

Missing data is a common problem in practical settings. Various imputation methods have been developed to deal with missing data. However, even though the label is usually available in the training data, the common practice of imputation usually relies only on the input and ignores the label. In this work, we illustrate how stacking the label into the input can significantly improve the imputation of the input. In addition, we propose a classification strategy that initializes the predicted test label with missing values and stacks the label with the input for imputation. This allows imputing the label and the input at the same time. Also, the technique is capable of handling training data with missing labels without any prior imputation and is applicable to continuous, categorical, or mixed-type data. Experiments show promising results in terms of accuracy.
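The idea of stacking the label into the input can be sketched in a few lines. The imputer below (`iterative_impute`, a plain regress-each-column-on-the-others loop) is a stand-in chosen for this example, not the method evaluated in the paper, and the toy data is constructed so that the first feature depends strongly on the label.

```python
import numpy as np


def iterative_impute(X, n_iter=10):
    """Fill NaNs by repeatedly regressing each column on all the others.

    A simple stand-in imputer (an assumption of this sketch).
    """
    X = np.array(X, dtype=float)  # copy so the caller's array is untouched
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = col_means[np.where(missing)[1]]  # initial mean fill
    ones = np.ones((X.shape[0], 1))
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            A = np.hstack([np.delete(X, j, axis=1), ones])
            coef, *_ = np.linalg.lstsq(A[~rows], X[~rows, j], rcond=None)
            X[rows, j] = A[rows] @ coef
    return X


def impute_with_labels(X_train, y_train):
    """Stack the label as an extra column, impute, then drop the label column."""
    stacked = np.column_stack([X_train, y_train])
    return iterative_impute(stacked)[:, :-1]


# Toy data: feature 0 depends on the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
X_true = np.column_stack([10.0 * y + 0.1 * rng.normal(size=200),
                          rng.normal(size=200)])
X_obs = X_true.copy()
hidden = rng.random(200) < 0.3
X_obs[hidden, 0] = np.nan

err_plain = np.abs(iterative_impute(X_obs)[hidden, 0] - X_true[hidden, 0]).mean()
err_label = np.abs(impute_with_labels(X_obs, y)[hidden, 0] - X_true[hidden, 0]).mean()
```

For classification in the same spirit, the test rows would be appended with their label entries set to NaN, so that one imputation pass fills in inputs and labels together.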

artificial intelligence, imputation, machine learning, (14 more...)

2311.16877

Country: Europe > Norway > Eastern Norway > Oslo (0.05)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)

Chen, Thomas, Ewald, Patricia Muñoz

Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

arXiv.org Machine Learning, Sep-19-2023

In this paper, we provide a geometric interpretation of the structure of shallow neural networks characterized by one hidden layer, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal to noise ratio of training inputs. We obtain an approximate optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of training input vectors belonging to the same output vector $y_j$, $j=1,\dots,Q$. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function; the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes the $Q$-dimensional subspace in the input space ${\mathbb R}^M$ spanned by $\overline{x_{0,j}}$, $j=1,\dots,Q$. We comment on the characterization of the global minimum of the cost function in the given context.

cost function, shallow network, theorem 3, (16 more...)

2309.1037

Country:

North America > United States > Texas > Travis County > Austin (0.14)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Lalchand, Vidhi, Tazi, Kenza, Cheema, Talay M., Turner, Richard E., Hosking, Scott

Kernel Learning for Explainable Climate Science

arXiv.org Artificial Intelligence, Jul-16-2023

The Upper Indus Basin (UIB) in the Himalayas provides water for 270 million people and countless ecosystems. However, precipitation, a key component of hydrological modelling, is poorly understood in this area. A key challenge surrounding this uncertainty comes from the complex spatial-temporal distribution of precipitation across the basin. In this work we propose Gaussian processes with structured non-stationary kernels to model precipitation patterns in the UIB. Previous attempts to quantify or model precipitation in the Hindu Kush Karakoram Himalayan region have often been qualitative or include crude assumptions and simplifications which cannot be resolved at lower resolutions. This body of research also provides little to no error propagation. We account for the spatial variation in precipitation with a non-stationary Gibbs kernel parameterised with an input-dependent lengthscale. This allows the posterior function samples to adapt to the varying precipitation patterns inherent in the distinct underlying topography of the Indus region. The input-dependent lengthscale is governed by a latent Gaussian process with a stationary squared-exponential kernel to allow the function-level hyperparameters to vary smoothly. In ablation experiments we motivate each component of the proposed kernel by demonstrating its ability to model the spatial covariance, temporal structure and joint spatio-temporal reconstruction. We benchmark our model against a stationary Gaussian process and a deep Gaussian process.
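The Gibbs kernel with an input-dependent lengthscale has a compact closed form in one dimension, sketched below. The lengthscale function `toy_lengthscale` is a hypothetical fixed function used only for illustration; in the abstract the lengthscale is instead governed by a latent Gaussian process, which this sketch does not model.

```python
import numpy as np


def gibbs_kernel(x1, x2, lengthscale):
    """1-D Gibbs kernel:

    k(x, x') = sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2))
               * exp(-(x - x')^2 / (l(x)^2 + l(x')^2))
    """
    x1 = np.asarray(x1, dtype=float)[:, None]  # shape (n, 1)
    x2 = np.asarray(x2, dtype=float)[None, :]  # shape (1, m)
    l1 = lengthscale(x1)
    l2 = lengthscale(x2)
    denom = l1 ** 2 + l2 ** 2
    prefactor = np.sqrt(2.0 * l1 * l2 / denom)
    return prefactor * np.exp(-((x1 - x2) ** 2) / denom)


def toy_lengthscale(x):
    # Hypothetical smooth, strictly positive lengthscale (assumption of this sketch).
    return 0.5 + 0.4 * np.tanh(x)


x = np.linspace(-2.0, 2.0, 5)
K = gibbs_kernel(x, x, toy_lengthscale)

# With a constant lengthscale the kernel reduces to the squared-exponential kernel.
K_const = gibbs_kernel(x, x, lambda t: np.ones_like(t))
```

Short lengthscales let the function vary quickly in some regions and long ones keep it smooth elsewhere, which is the non-stationary behaviour the abstract relies on.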

artificial intelligence, machine learning, modeling & simulation, (19 more...)

2209.04947

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)
Asia > China (0.05)
Asia > Pakistan > Gilgit-Baltistan > Gilgit (0.04)
Asia > Nepal (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Qu, Wenjie, Jia, Jinyuan, Gong, Neil Zhenqiang

REaaS: Enabling Adversarially Robust Downstream Classifiers via Robust Encoder as a Service

arXiv.org Artificial Intelligence, Jan-7-2023

Abstract--Encoder as a service is an emerging cloud service. [...] In an encoder as a service, a service provider (e.g., OpenAI, Google, and Amazon) pre-trains a general-purpose feature extractor (called encoder) and deploys it as a cloud service; and a client queries the cloud service APIs for the feature vectors of its training/testing inputs when training/testing a downstream classifier. For instance, the encoder could be pre-trained using supervised learning on a large amount of labeled data or self-supervised learning [1], [2] on a large amount of unlabeled data. A client could be a smartphone, IoT device, self-driving car, or edge device in the era of edge computing. In the Standard Encoder as a Service (SEaaS), the service provides a single API (called Feature-API) for clients [...]

A larger certified radius indicates better certified robustness against adversarial examples. In general, there are two categories of complementary methods to build a certifiably robust classifier and derive its certified radius for a testing input, i.e., base classifier (BC) based certification [7], [8], [9], [10] and smoothed classifier (SC) based certification (also known as randomized smoothing) [11], [12], [13]. BC based certification aims to directly derive the certified radius of a given classifier (called base classifier) for a testing input. BC based certification requires white-box access to the base classifier as it often requires propagating the perturbation from the input layer to the output layer of the base classifier layer by layer. SC based [...] the smoothed classifier for the testing input. To increase the testing inputs' certified radii, SC based certification often requires [...] the certification.

However, the client does not have white-box access to the encoder deployed on the cloud server, making [...] The second challenge is that, although a client can use SC based certification by treating the composition of the encoder and its downstream classifier as a base classifier, it incurs a large communication cost for the client and a large computation cost for the cloud [...] Therefore, the client requires e queries to the Feature-API per training input, where e is the number of epochs used to train the downstream [...]

Our input-space certified radius R guarantees the client's base or smoothed downstream classifier predicts the same label for the testing input if the l [...] perturbation added to the testing input is less than R. The key challenge of implementing our F2IPerturb-API is how to find the largest input-space certified radius R for a given testing input and its feature-space certified radius R [...] problem is challenging to solve due to the highly non-linear constraint.

Wenjie Qu performed this research when he was an intern in Gong's group.
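For readers unfamiliar with smoothed classifier (SC) based certification, the sketch below evaluates the widely used certified-radius bound for Gaussian randomized smoothing, $R = \frac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)$, where $p_A$ is a lower bound on the top-class probability under noise and $p_B$ an upper bound on the runner-up probability. This is the generic bound from the randomized smoothing literature, shown here as an assumption about what such a certificate looks like; it is not the F2IPerturb-API computation, and the function name `smoothing_radius` is hypothetical.

```python
from statistics import NormalDist


def smoothing_radius(p_a, p_b, sigma):
    """Certified radius for Gaussian randomized smoothing.

    p_a:   lower bound on the probability of the predicted class under noise
    p_b:   upper bound on the probability of the runner-up class
    sigma: standard deviation of the Gaussian noise added to the input
    """
    if not (0.0 < p_b <= p_a < 1.0):
        raise ValueError("need 0 < p_b <= p_a < 1")
    phi_inv = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return 0.5 * sigma * (phi_inv(p_a) - phi_inv(p_b))


# Example inputs: top-class probability Phi(1), runner-up 1 - Phi(1), noise 0.25.
p_top = NormalDist().cdf(1.0)
radius = smoothing_radius(p_top, 1.0 - p_top, sigma=0.25)
```

The bound grows as the gap between the two class probabilities widens, so a more confident prediction under noise certifies a larger radius.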

artificial intelligence, classifier, machine learning, (18 more...)

2301.02905

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Detommaso, Gianluca, Gasparin, Alberto, Wilson, Andrew, Archambeau, Cedric

Uncertainty Calibration in Bayesian Neural Networks via Distance-Aware Priors

arXiv.org Artificial Intelligence, Jul-17-2022

As we move away from the data, the predictive uncertainty should increase, since a great variety of explanations are consistent with the little available information. We introduce Distance-Aware Prior (DAP) calibration, a method to correct overconfidence of Bayesian deep learning models outside of the training domain. We define DAPs as prior distributions over the model parameters that depend on the inputs through a measure of their distance from the training set. DAP calibration is agnostic to the posterior inference method, and it can be performed as a post-processing step. We demonstrate its effectiveness against several baselines in a variety of classification and regression problems, including benchmarks designed to test the quality of predictive distributions away from the data.

calibration, dap calibration, model parameter, (14 more...)

2207.082

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)